AITopics | adversarial attack

Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs

Yuan, Leitao, Mao, Qinghua, Liu, Daizong, Wang, Kun, Wang, Wenjie, Teng, Yan, Shao, Jing, Liu, Dongrui

arXiv.org Machine LearningMay-22-2026

Multimodal large language models (MLLMs) remain vulnerable to transfer-based targeted attacks, where perturbations optimized on open-source surrogate encoders can generalize to closed-source MLLMs. A key challenge for improving adversarial transferability is to effectively capture the intrinsic visual focus shared across different models, such that perturbations align with transferable semantic cues rather than surrogate-specific behaviors. However, existing methods suffer from spatial-domain feature redundancy and surrogate-specific gradient signals, thereby hindering cross-model transferability. In this paper, we propose FRA-Attack, which addresses both challenges from a unified frequency-domain regularization perspective. For feature alignment, a high-pass DCT objective on patch features suppresses redundant global structures and concentrates the loss on the high-frequency band that carries the MLLMs' intrinsic visual focus. For gradient optimization, we introduce Frequency-domain Gradient Regularization (FGR), a \textit{model-agnostic} low-pass regularizer that modulates the surrogate gradient using only the geometric frequency coordinate, \textit{i.e.}, no surrogate-derived statistic is involved, so that FGR is model-agnostic by construction, removing surrogate-specific high-frequency artifacts while preserving transferable low-frequency directions. Together, the two components form a unified frequency-domain treatment of transferability. Extensive experiments on $15$ flagship MLLMs across $7$ vendors show that FRA-Attack achieves superior cross-model transferability, particularly with state-of-the-art performance on GPT-5.4, Claude-Opus-4.6 and Gemini-3-flash.

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2605.21541

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

d9888cc7baa04c2e44e8115588133515-Paper-Conference.pdf

Neural Information Processing SystemsMay-1-2026, 04:55:43 GMT

adversarial robustness, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

16d11e9595188dbad0418a85f0351aba-Supplemental.pdf

Neural Information Processing SystemsMay-1-2026, 01:50:51 GMT

This section introduces more backgrounds on poisoning attacks and backdoor attacks, and details on the adversarial attacks that we use to craft accumulative poisoning samples in our methods. Finally, we describe the commonly used anomaly detection methods against adversarially crafted samples, following previous settings [40]. B.1 Poisoning attacks and backdoor attacks There is extensive prior work on poisoning attacks, especially in the offline settings against SVM [3], logistic regression [36], collaborative filtering [27], feature selection [54], clustering [8], and neural networks [9, 21, 22, 38, 50]. Poisoning attacks in real-time data streaming are studied on online SVM [4], autoregressive models [1, 7], bandit algorithms [20, 31, 33], and classification [26, 52, 57]. Compared to poisoning attacks, backdoor attacks draw attention in more recent researches.

artificial intelligence, arxiv preprint arxiv, machine learning, (10 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Backpropagating Linearly Improves Transferability of Adversarial Examples (Supplementary Material)

Neural Information Processing SystemsApr-30-2026, 19:57:26 GMT

Empirical results in Section 3.1 in the main paper show that simply removing ReLUs lead to improved transferability. In this section, we try freezing all learnable parameters in the unmodified sub-net h during fine-tuning and a similar observation about the initial improvement of transferability can still be decrease made and (see finally Figure the 5). Classification loss of these modified VGG-19 models on the benign CIFAR-10 test set is also reported, in Figure 6. On ImageNet, it is evaluated on the 50000official validation images. As mentioned in the main paper, many recent successes in improving adversarial transferability benefit from maximizing intermediate level distortions rather than the final prediction losses [8, 3, 2] of DNNs.

artificial intelligence, machine learning, source model, (16 more...)

Neural Information Processing Systems

Country: North America (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

Backpropagating Linearly Improves Transferability of Adversarial Examples

Neural Information Processing SystemsApr-30-2026, 19:57:19 GMT

The vulnerability of deep neural networks (DNNs) to adversarial examples has drawn great attention from the community. In this paper, we study the transferability of such examples, which lays the foundation of many black-box attacks on DNNs. We revisit a not so new but definitely noteworthy hypothesis of Goodfellow et al.'s and disclose that the transferability can be enhanced by improving the linearity of DNNs in an appropriate manner. We introduce linear backpropagation (LinBP), a method that performs backpropagation in a more linear fashion using off-the-shelf attacks that exploit gradients. More specifically, it calculates forward as normal but backpropagates loss as if some nonlinear activations are not encountered in the forward pass. Experimental results demonstrate that this simple yet effective method obviously outperforms current state-of-the-arts in crafting transferable adversarial examples on CIFAR-10 and ImageNet, leading to more effective attacks on a variety of DNNs.

adversarial example, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (0.50)
Government > Military (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

f5846131aa6a72d1df3bd6d43a4a960b-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 07:58:31 GMT

artificial intelligence, detector, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

Add feedback

ab5a2bf4385bee44f3919060b184605b-Paper-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 08:57:38 GMT

artificial intelligence, explanation, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Industry: Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Adv-Attribute: Inconspicuous and Transferable Adversarial Attack on Face Recognition

Neural Information Processing SystemsApr-28-2026, 02:36:01 GMT

Deep learning models have shown their vulnerability when dealing with adversarial attacks. Existing attacks almost perform on low-level instances, such as pixels and super-pixels, and rarely exploit semantic clues. For face recognition attacks, existing methods typically generate the ℓp-norm perturbations on pixels, however, resulting in low attack transferability and high vulnerability to denoising defense models. In this work, instead of performing perturbations on the low-level pixels, we propose to generate attacks through perturbing on the high-level semantics to improve attack transferability. Specifically, a unified flexible framework, Adversarial Attributes (Adv-Attribute), is designed to generate inconspicuous and transferable attacks on face recognition, which crafts the adversarial noise and adds it into different attributes based on the guidance of the difference in face recognition features from the target. Moreover, the importance-aware attribute selection and the multi-objective optimization strategy are introduced to further ensure the balance of stealthiness and attacking strength. Extensive experiments on the FFHQ and CelebA-HQ datasets show that the proposed Adv-Attribute method achieves the state-of-the-art attacking success rates while maintaining better visual effects against recent attack methods.

artificial intelligence, computer vision and pattern recognition, machine learning, (14 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

fb4c48608ce8825b558ccf07169a3421-Supplemental.pdf

Neural Information Processing SystemsApr-27-2026, 22:53:20 GMT

In this section, we perform additional diagnostics that give us confidence that our models are not doing any form of gradient obfuscation or masking [3, 53]. First, we report in Table 4 the robust accuracy obtained by our strongest models against a diverse set of attacks. The cascade is composed as follows: AUTOPGD-CE, an untargeted attack using PGD with an adaptive step on the cross-entropy loss [10], AUTOPGD-T, a targeted attack using PGD with an adaptive step on the difference of logits ratio [10], FAB-T, a targeted attack which minimizes the norm of adversarial perturbations [9], SQUARE, a query-efficient black-box attack [1]. First, we observe that our combination of attacks, denoted AA+MT matches the final robust accuracy measured by AUTOATTACK. Second, we also notice that the black-box attack (i.e., SQUARE) does not find any additional adversarial examples.

accuracy, artificial intelligence, robust accuracy, (17 more...)

Neural Information Processing Systems

Industry: Transportation > Air (0.55)

Technology: Information Technology > Artificial Intelligence (0.70)

Add feedback